5 research outputs found

    Minimisation of Multiplicity Tree Automata

    Full text link
    We consider the problem of minimising the number of states in a multiplicity tree automaton over the field of rational numbers. We give a minimisation algorithm that runs in polynomial time assuming unit-cost arithmetic. We also show that a polynomial bound in the standard Turing model would require a breakthrough in the complexity of polynomial identity testing by proving that the latter problem is logspace equivalent to the decision version of minimisation. The developed techniques also improve the state of the art in multiplicity word automata: we give an NC algorithm for minimising multiplicity word automata. Finally, we consider the minimal consistency problem: does there exist an automaton with nn states that is consistent with a given finite sample of weight-labelled words or trees? We show that this decision problem is complete for the existential theory of the rationals, both for words and for trees of a fixed alphabet rank.Comment: Paper to be published in Logical Methods in Computer Science. Minor editing changes from previous versio

    Learning via automaton minimization and matrix factorization

    No full text
    This thesis focuses on weighted-automaton learning and minimization, matrix factorization, and the relationships between these areas. The first part of the thesis studies minimization and learning of weighted automata with weights in a field, focusing on tree automata as representations of distributions over trees. We give a minimization algorithm that runs in polynomial time assuming unit-cost arithmetic, and show that a polynomial bound in the Turing model would require a breakthrough in the complexity of polynomial identity testing. We also improve the complexity of minimizing weighted word automata. Secondly, we look at learning minimal weighted automata in both active and passive learning frameworks, analysing both the computational and query complexity. For active learning, we give a new algorithm for learning weighted tree automata in Angluin's exact learning model that efficiently processes DAG counterexamples, and show a complexity lower bound in terms of polynomial identity testing. For passive learning, we characterise the complexity of finding a minimal weighted automaton that is consistent with a set of observations. The second part of the thesis studies nonnegative matrix factorization (NMF), a powerful dimension-reduction and feature-extraction technique with numerous applications in machine learning and other scientific disciplines. Despite being an important technique in practice, there are a number of open questions about the theory of NMF---especially about its complexity and the existence of good heuristics. We answer a major open problem posed in 1993 by Cohen and Rothblum: the nonnegative ranks over the rationals and over the reals differ. We first tackle a variant of the problem, restricted NMF, which has applications in topic modelling and a geometric interpretation as the nested polytope problem. Lastly we show that NMF is polynomial-time reducible to automaton minimization, and then use our results on NMF to answer old open problems on minimizing labelled Markov chains and probabilistic automata.</p

    Learning via automaton minimization and matrix factorization

    No full text
    This thesis focuses on weighted-automaton learning and minimization, matrix factorization, and the relationships between these areas. The first part of the thesis studies minimization and learning of weighted automata with weights in a field, focusing on tree automata as representations of distributions over trees. We give a minimization algorithm that runs in polynomial time assuming unit-cost arithmetic, and show that a polynomial bound in the Turing model would require a breakthrough in the complexity of polynomial identity testing. We also improve the complexity of minimizing weighted word automata. Secondly, we look at learning minimal weighted automata in both active and passive learning frameworks, analysing both the computational and query complexity. For active learning, we give a new algorithm for learning weighted tree automata in Angluin's exact learning model that efficiently processes DAG counterexamples, and show a complexity lower bound in terms of polynomial identity testing. For passive learning, we characterise the complexity of finding a minimal weighted automaton that is consistent with a set of observations. The second part of the thesis studies nonnegative matrix factorization (NMF), a powerful dimension-reduction and feature-extraction technique with numerous applications in machine learning and other scientific disciplines. Despite being an important technique in practice, there are a number of open questions about the theory of NMF---especially about its complexity and the existence of good heuristics. We answer a major open problem posed in 1993 by Cohen and Rothblum: the nonnegative ranks over the rationals and over the reals differ. We first tackle a variant of the problem, restricted NMF, which has applications in topic modelling and a geometric interpretation as the nested polytope problem. Lastly we show that NMF is polynomial-time reducible to automaton minimization, and then use our results on NMF to answer old open problems on minimizing labelled Markov chains and probabilistic automata.</p

    Genome-wide association study identifies loci influencing concentrations of liver enzymes in plasma.

    Get PDF
    Concentrations of liver enzymes in plasma are widely used as indicators of liver disease. We carried out a genome-wide association study in 61,089 individuals, identifying 42 loci associated with concentrations of liver enzymes in plasma, of which 32 are new associations (P = 10(-8) to P = 10(-190)). We used functional genomic approaches including metabonomic profiling and gene expression analyses to identify probable candidate genes at these regions. We identified 69 candidate genes, including genes involved in biliary transport (ATP8B1 and ABCB11), glucose, carbohydrate and lipid metabolism (FADS1, FADS2, GCKR, JMJD1C, HNF1A, MLXIPL, PNPLA3, PPP1R3B, SLC2A2 and TRIB1), glycoprotein biosynthesis and cell surface glycobiology (ABO, ASGR1, FUT2, GPLD1 and ST3GAL4), inflammation and immunity (CD276, CDH6, GCKR, HNF1A, HPR, ITGA1, RORA and STAT4) and glutathione metabolism (GSTT1, GSTT2 and GGT), as well as several genes of uncertain or unknown function (including ABHD12, EFHD1, EFNA1, EPHA2, MICAL3 and ZNF827). Our results provide new insight into genetic mechanisms and pathways influencing markers of liver function
    corecore